Generalized Hultman Numbers and Cycle Structures of Breakpoint Graphs

Abstract

Genome rearrangements can be modeled as k-breaks, which break a genome at k
positions and glue the resulting fragments in a new order. In particular, reversals,
translocations, fusions, and fissions are modeled as 2-breaks, and transpositions are
modeled as 3-breaks. While k-break rearrangements for k > 3 have not been observed
in evolution, they are used in cancer genomics to model chromothripsis, a catastrophic
event of multiple breakages happening simultaneously in a genome. It is known that
the k-break distance between two genomes (i.e., the minimum number of k-breaks
required to transform one genome into the other) can be computed in terms of cycle
lengths in the breakpoint graph of these genomes.

In the current work, we address the combinatorial problem of enumerating genomes
at a given k-break distance from a fixed unichromosomal genome. More generally, we
enumerate genome pairs whose breakpoint graph has a given distribution of cycle
lengths. We further show how our enumeration can be used for uniform sampling of
random genomes at a given k-break distance, and describe its connection to various
combinatorial objects such as Bell polynomials.


1 Introduction 


Genome rearrangements are evolutionary events that change gene order along the genome.
Genome rearrangements can be modeled as k-breaks (Alekseyev and Pevzner, 2008),
which break a genome at k positions and glue the resulting fragments in a new order.
The most frequent genome rearrangements, such as reversals (which flip segments of a
chromosome), translocations (which exchange segments of two chromosomes), fusions (which
merge two chromosomes into one), and fissions (which split a single chromosome into two),
can be modeled as 2-breaks (also called Double-Cut-and-Join or DCJ in Yancopoulos et al.,
2005), while more complex and rare genome rearrangements such as transpositions are modeled
as 3-breaks. While k-break rearrangements for k > 3 have not been observed in evolution,
they are used in cancer genomics to model chromothripsis, a catastrophic event of multiple
breakages happening simultaneously in the genome (Stephens et al., 2011; Weinreb et al.,
2014).


*The George Washington University, Washington, DC, USA
†St. Petersburg State University, St. Petersburg, Russia
‡Corresponding author. Email: nikita_alexeev@gwu.edu


The k-break distance between two genomes is defined as the minimum number of
k-breaks required to transform one genome into the other. The 2-break (DCJ) distance is
often used in phylogenomic studies to estimate the evolutionary remoteness of genomes. The
k-break distance between two genomes can be expressed in terms of cycles in the breakpoint
graph of these genomes. Namely, while the 2-break distance depends only on the number of
cycles in this graph, the k-break distance in general depends on the distribution of the cycle
lengths (Alekseyev and Pevzner, 2008).

In the current work, we address the combinatorial enumeration of genomes at a given
k-break distance from a fixed unichromosomal genome. More generally, for a fixed
unichromosomal genome P, we enumerate all genomes Q such that the breakpoint graph of P and Q
has a given distribution of cycle lengths. We consider various flavors of this problem, where
genes may be arbitrarily oriented or co-oriented along the genomes,^1 while the genomes
Q may be unichromosomal or multichromosomal. In the multichromosomal case we restrict
genomes to contain only circular chromosomes, while in the unichromosomal case we consider
both circular and linear genomes.

Previous studies are mostly concerned with 2-break distances between unichromosomal
genomes. In particular, unichromosomal genomes with co-oriented genes can be interpreted
as permutations, and the number of permutations at a given 2-break distance from the
identity permutation is given by Hultman numbers (Hultman, 1999). Doignon and Labarre
(2007) gave a closed formula for Hultman numbers, and Bóna and Flynn (2009) proved a relation
between Hultman numbers and Stirling numbers of the first kind. The case of 2-break
distances between genomes with arbitrarily oriented genes was solved by Grusea and Labarre
(2013). The asymptotic distribution of 2-break distances was proved to be normal by
Alexeev and Zograf (2014). The analog of Hultman numbers for multichromosomal circular
genomes was recently studied by Feijão et al. (2014). The current work generalizes all these
results.


2 Background 


We start our analysis with (multichromosomal) circular genomes and later extend it to 
unichromosomal linear genomes. 

We represent a circular genome consisting of genes $\{1, 2, \dots, n\}$ as a genome graph. This
graph contains $2n$ vertices: for each gene $i \in \{1, 2, \dots, n\}$, there are the tail and head
vertices $i^t$ and $i^h$. The graph has $n$ directed gene edges of the form $(i^t, i^h)$ encoding $n$ genes,
and $n$ undirected adjacency edges connecting neighboring head/tail vertices of adjacent genes
(Fig. 1a). We remark that for the genomes with co-oriented genes all adjacency edges connect
the head of one gene with the tail of another.

^1 The case of unoriented genes is presumably much harder. For example, computing the reversal distance
between unichromosomal genomes with unoriented genes is known to be NP-hard (Caprara, 1997).

Figure 1: For genomes $P = (1, 2, 3, 4, 5, 6)$ and $Q = (1, -3)(2, -6)(4, -5)$, (a) the genome
graph of $Q$; (b) the breakpoint graph $G(P,Q)$, where the adjacency edges of $P$ and $Q$ are
colored black (solid) and gray (dashed), respectively. The graph $G(P,Q)$ consists of one
2-cycle and one 4-cycle.

Let $P$ and $Q$ be a pair of circular genomes on the same genes $\{1, 2, \dots, n\}$. We assume
that in their genome graphs the adjacency edges of $P$ are colored black and the adjacency
edges of $Q$ are colored gray. The breakpoint graph $G(P,Q)$ is defined on the set of vertices
$\{i^t, i^h \mid i = 1, \dots, n\}$ with black and gray edges inherited from genome graphs of $P$ and
$Q$ (Fig. 1b). Since each vertex in $G(P,Q)$ has degree 2, the black and gray edges form a
collection of alternating black-gray cycles. We say that a black-gray cycle is an $\ell$-cycle if it
is composed of $\ell$ black and $\ell$ gray edges. Let $c_\ell(P,Q)$ be the number of $\ell$-cycles in $G(P,Q)$.
Then the total number of black edges in $G(P,Q)$ equals

\[ \sum_{\ell \ge 1} \ell \cdot c_\ell(P,Q) = n. \]
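The construction above can be sketched in code. The following Python snippet is a minimal illustration, not part of the original work: the function names `adjacencies` and `cycle_structure`, and the encoding of a genome as a list of tuples of signed genes, are assumptions made for this example. It builds the black and gray adjacency edges and counts the $\ell$-cycles of the breakpoint graph.

```python
from collections import Counter

def adjacencies(genome):
    """Adjacency edges of a circular genome, as a symmetric dict vertex -> vertex.

    A genome is a list of circular chromosomes, each a tuple of signed genes
    (hypothetical encoding). Vertex (g, 't') is the tail of gene g, (g, 'h') its head.
    """
    edges = {}
    for chrom in genome:
        for k, gene in enumerate(chrom):
            nxt = chrom[(k + 1) % len(chrom)]
            # A positive gene is left through its head, a negative one through its tail;
            # the next gene is entered through its tail if positive, its head otherwise.
            u = (abs(gene), 'h' if gene > 0 else 't')
            v = (abs(nxt), 't' if nxt > 0 else 'h')
            edges[u] = v
            edges[v] = u
    return edges

def cycle_structure(p, q):
    """Return {l: number of l-cycles} for the breakpoint graph G(P, Q)."""
    black, gray = adjacencies(p), adjacencies(q)
    seen, counts = set(), Counter()
    for start in black:
        if start in seen:
            continue
        v, black_edges = start, 0
        while v not in seen:
            seen.add(v)
            w = black[v]          # follow a black edge ...
            seen.add(w)
            v = gray[w]           # ... and then a gray edge
            black_edges += 1
        counts[black_edges] += 1
    return dict(counts)

P = [(1, 2, 3, 4, 5, 6)]
Q = [(1, -3), (2, -6), (4, -5)]
print(cycle_structure(P, Q))
```

For the genomes of Figure 1 this yields one 2-cycle and one 4-cycle, in agreement with the caption, and for $Q = P$ it yields $n$ cycles of length 1.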


A $k$-break in genome $Q$ corresponds to an operation in its genome graph and the breakpoint
graph $G(P,Q)$. Namely, a $k$-break replaces any $k$-tuple of gray edges with another
$k$-tuple of gray edges forming a matching on the same set of $2k$ vertices (Fig. 2). A
transformation of genome $Q$ into genome $P$ with $k$-breaks can therefore be viewed as a transformation
of the breakpoint graph $G(P,Q)$ into the breakpoint graph $G(P,P)$ with $k$-breaks on gray
edges. The $k$-break distance $d_k(P,Q)$ between genomes $P$ and $Q$ is the minimum number of
$k$-breaks in such a transformation.


The 2-break distance between genomes $P$ and $Q$ is given by the following formula (Yancopoulos et al.,
2005):

\[ d_2(P,Q) = n - c(P,Q), \tag{1} \]

where $c(P,Q) = \sum_{\ell \ge 1} c_\ell(P,Q)$ is the total number of cycles in $G(P,Q)$. Formulae for the
$k$-break distance for $k > 2$ are more sophisticated. In particular, $d_3(P,Q)$ and $d_4(P,Q)$ are
given by the following formulae (Alekseyev and Pevzner, 2008):

\[ d_3(P,Q) = \frac{n - c_{2,1}(P,Q)}{2}, \tag{2} \]

\[ d_4(P,Q) = \left\lceil \frac{n - c_{3,1}(P,Q) - \lfloor c_{3,2}(P,Q)/2 \rfloor}{3} \right\rceil, \tag{3} \]

Figure 2: A 3-break transforming genome $Q_1 = (1, -3, 5, 2, -4, 6)$ into genome $Q_2 =
(1, 5, 2, -3, -4, 6)$ corresponds to a transformation of the breakpoint graph $G(P,Q_1)$ into
$G(P,Q_2)$ by replacing the gray edges $\{1^h, 3^h\}$, $\{2^h, 4^h\}$, and $\{3^t, 5^t\}$ with the gray edges
$\{1^h, 5^t\}$, $\{2^h, 3^h\}$, and $\{3^t, 4^h\}$.


where

\[ c_{m,i}(P,Q) = \sum_{\ell \equiv i \pmod m} c_\ell(P,Q). \]
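As a hedged illustration of formulae (1)-(3), the sketch below evaluates the 2-, 3-, and 4-break distances directly from a list of cycle lengths. The function name `k_break_distance` and the list encoding are assumptions of this example, and the formulae are used in the form $d_3 = (n - c_{2,1})/2$ and $d_4 = \lceil (n - c_{3,1} - \lfloor c_{3,2}/2 \rfloor)/3 \rceil$.

```python
from collections import Counter

def k_break_distance(k, cycle_lengths):
    """k-break distance (k = 2, 3, 4) from the cycle lengths of G(P, Q)."""
    n = sum(cycle_lengths)                      # number of genes (black edges)

    def c(m, i):
        # c_{m,i}: number of cycles whose length is congruent to i modulo m
        return sum(1 for length in cycle_lengths if length % m == i)

    if k == 2:
        return n - len(cycle_lengths)           # formula (1)
    if k == 3:
        return (n - c(2, 1)) // 2               # formula (2); n - c_{2,1} is always even
    if k == 4:
        x = n - c(3, 1) - c(3, 2) // 2
        return -(-x // 3)                       # formula (3); ceiling of x / 3
    raise ValueError("only k = 2, 3, 4 are covered by formulae (1)-(3)")

def distance_distribution(k, counts):
    """Group genome counts {cycle lengths: number of genomes} by k-break distance."""
    dist = Counter()
    for cycle_lengths, number in counts.items():
        dist[k_break_distance(k, cycle_lengths)] += number
    return dict(dist)

# Breakpoint graph of Figure 1: one 2-cycle and one 4-cycle.
print([k_break_distance(k, [2, 4]) for k in (2, 3, 4)])
```

Feeding `distance_distribution` with the coefficients of $F_5(0; s_1, s_2, \dots)$ listed in Section 4.1 should reproduce the $n = 5$ rows of the unichromosomal tables in Section 4.3.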

For a fixed unichromosomal genome $P$ with $n$ genes and a given vector $(c_1, c_2, c_3, \dots, c_n)$
of nonnegative integers such that $\sum_{\ell=1}^{n} \ell \cdot c_\ell = n$, we will compute the number of genomes
$Q$ such that $G(P,Q)$ consists of $c_\ell$ $\ell$-cycles (for each $\ell \in \{1, 2, \dots, n\}$). As an application, this
enumeration will allow us to find the distribution of $k$-break distances from various genomes
$Q$ to a fixed genome $P$ for any $k \ge 2$.

3 Genomes With A Fixed Breakpoint Graph 

Let $\mathbf{c} = (c_1, c_2, c_3, \dots)$ be a sequence of nonnegative integers with a finite number of nonzero
(i.e., strictly positive) terms. Then $L(\mathbf{c}) = \sum_{\ell \ge 1} \ell \cdot c_\ell$ is a finite integer. We say that a
breakpoint graph has cycle structure $\mathbf{c}$ if for every positive integer $\ell$, the number of $\ell$-cycles
in this graph equals $c_\ell$.

Let $P$ be a fixed unichromosomal genome with $n$ genes and $\mathcal{Q}_n(h;\mathbf{c})$ be the set of
$h$-chromosomal genomes $Q$ on the same $n$ genes such that $G(P,Q)$ has cycle structure $\mathbf{c}$.^2
Let $M_n(h;\mathbf{c})$ be the cardinality of $\mathcal{Q}_n(h;\mathbf{c})$, i.e., $M_n(h;\mathbf{c}) = |\mathcal{Q}_n(h;\mathbf{c})|$. Clearly, we have

^2 We remark that the genome $P$ essentially corresponds to a cyclic order on the genes of genomes $Q$. Hence,
$\mathcal{Q}_n(h;\mathbf{c})$ is well-defined as soon as we are given a cyclic order on the genes. Without loss of generality, we
may assume that the genes are labeled by numbers from 1 to $n$ (up to a cyclic rotation).


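$M_n(h;\mathbf{c}) = 0$ unless $L(\mathbf{c}) = n$.^3 We remark that $M_n(h;\mathbf{c})$ does not depend on the order of
genes in $P$ but only on their quantity.

The generating function of numbers $M_n(h;\mathbf{c})$ is defined by

\[ F(x;u;s_1,s_2,\dots) = \sum_{\mathbf{c}} \sum_{h=1}^{\infty} M_{L(\mathbf{c})}(h;\mathbf{c})\, x^{L(\mathbf{c})-1} u^{h-1} \prod_{i=1}^{\infty} s_i^{c_i}
 = \sum_{n=1}^{\infty} x^{n-1} \sum_{\mathbf{c}:\, L(\mathbf{c})=n} \sum_{h=1}^{\infty} u^{h-1} M_n(h;\mathbf{c}) \prod_{i=1}^{\infty} s_i^{c_i}. \]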


We remark that $F(x;u;s_1,s_2,\dots)$ at $x = 0$ equals $s_1$ (which corresponds to $G(P,Q)$, where
$Q = P$ consists of $n = 1$ gene), while at $u = 0$ it enumerates breakpoint graphs $G(P,Q)$ for
unichromosomal genomes $Q$.

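Theorem 3.1. The following equation, together with the initial condition $F(0;u;s_1,s_2,\dots) = s_1$,
uniquely determines the generating function $F(x;u;s_1,s_2,\dots)$:

\[ \frac{\partial F}{\partial x}
 = \sum_{i=2}^{\infty} \sum_{j=1}^{i-1} (i-1)\, s_j s_{i-j} \frac{\partial F}{\partial s_{i-1}}
 + \sum_{i=2}^{\infty} (i-1)^2 s_i \frac{\partial F}{\partial s_{i-1}}
 + 2 \sum_{i=2}^{\infty} \sum_{j=1}^{i-1} j(i-j)\, s_{i+1} \frac{\partial^2 F}{\partial s_j\, \partial s_{i-j}}
 + u \sum_{i=1}^{\infty} i\, s_{i+1} \frac{\partial F}{\partial s_i}. \tag{4} \]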


Proof. The theorem statement follows from Lemma 3.2 below, which essentially restates the
equation (4) as equalities of the coefficients of $x^{n-2} u^{h-1} \prod_{i \ge 1} s_i^{c_i}$ in the left-hand and right-hand
sides of (4). Furthermore, these equalities uniquely determine the values of all $M_n(h;\mathbf{c})$ by
induction on $n$, thus determining $F(x;u;s_1,s_2,\dots)$. □

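Lemma 3.2. For any positive integers $n$, $h$, we have (the initial condition)

\[ M_1(h;\mathbf{c}) = \begin{cases} 1, & \text{if } h = 1 \text{ and } \mathbf{c} = (1, 0, 0, \dots); \\ 0, & \text{otherwise;} \end{cases} \]

and for all $n > 1$

\begin{align}
(n-1)\, M_n(h;\mathbf{c})
 &= \sum_{i=2}^{\infty} \sum_{j=1}^{i-1} (i-1)(c_{i-1} + 1 - \delta_{j,1} - \delta_{i-j,1})\, M_{n-1}(h;\, \mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_j - \mathbf{e}_{i-j}) \tag{5} \\
 &+ \sum_{i=2}^{\infty} (i-1)^2 (c_{i-1} + 1)\, M_{n-1}(h;\, \mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_i) \tag{6} \\
 &+ 2 \sum_{i=2}^{\infty} \sum_{j=1}^{i-1} j(i-j)(c_j + 1)(c_{i-j} + 1 + \delta_{j,i-j})\, M_{n-1}(h;\, \mathbf{c} - \mathbf{e}_{i+1} + \mathbf{e}_j + \mathbf{e}_{i-j}) \tag{7} \\
 &+ \sum_{i=2}^{\infty} (i-1)(c_{i-1} + 1)\, M_{n-1}(h-1;\, \mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_i), \tag{8}
\end{align}

where $\delta_{i,j}$ is the Kronecker delta, and $\mathbf{e}_i = (\delta_{i,1}, \delta_{i,2}, \dots)$ is a unit vector (where all coordinates
except the $i$-th are zero).^4

^3 In fact, everywhere below in $M_n(h;\mathbf{c})$ we have $n = L(\mathbf{c})$, making the index $n$ redundant. However, we
find it beneficial to have it as a "checksum" for $\mathbf{c}$.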

Proof. We prove the lemma statement using double counting.^5

Let $n > 1$, $Q \in \mathcal{Q}_n(h;\mathbf{c})$, and $l \in \{1, \dots, n-1\}$. We remove gene $l$ from both genomes $P$
and $Q$ to obtain new genomes $P'$ and $Q'$ on $n - 1$ genes. Then the breakpoint graph $G(P',Q')$
can be obtained from $G(P,Q)$ by removal of vertices $l^t$ and $l^h$ and incident gray edges $\{l^t, a\}$,
$\{l^h, c\}$ and black edges $\{l^t, b\}$, $\{l^h, d\}$, and addition of a new gray edge $\{a, c\}$ (unless $a = l^h$
and $c = l^t$)^6 and a new black edge $\{b, d\}$ (Fig. 3). Clearly, in $G(P,Q)$ vertices $a, b$ belong to
the same black-gray cycle and so do vertices $c, d$. Similarly, in $G(P',Q')$ vertices $b, d$ belong
to the same black-gray cycle and so do vertices $a, c$ (if present).

Below we analyze how the cycle structure of $G(P',Q')$ may differ from the cycle structure
of $G(P,Q)$. There are four cases to consider:

Case 1. Vertices $a, b$ belong to a different cycle in $G(P,Q)$ than vertices $c$ and $d$. If these
cycles are a $j$-cycle and an $(i-j)$-cycle ($i > j \ge 1$), respectively, then $Q' \in \mathcal{Q}_{n-1}(h;\, \mathbf{c} + \mathbf{e}_{i-1} -
\mathbf{e}_j - \mathbf{e}_{i-j})$ and vertices $a, b, c, d$ belong to the same $(i-1)$-cycle in $G(P',Q')$.

Case 2. Vertices $a, b, c, d$ belong to the same $(i+1)$-cycle ($i \ge 1$) in $G(P,Q)$ and their
order is $(a, l^t, b, \dots, c, l^h, d, \dots)$. In this case, $Q' \in \mathcal{Q}_{n-1}(h;\, \mathbf{c} + \mathbf{e}_i - \mathbf{e}_{i+1})$ and vertices $a, b, c, d$
belong to the same $i$-cycle in $G(P',Q')$.

Case 3. Vertices $a, b, c, d$ belong to the same $(i+1)$-cycle ($i \ge 2$) in $G(P,Q)$ and their order
is $(a, l^t, b, \dots, d, l^h, c, \dots)$. In this case, $Q' \in \mathcal{Q}_{n-1}(h;\, \mathbf{c} - \mathbf{e}_{i+1} + \mathbf{e}_j + \mathbf{e}_{i-j})$, edge $\{a, c\}$ belongs
to some $j$-cycle ($1 \le j < i$) in $G(P',Q')$, and edge $\{b, d\}$ belongs to some $(i-j)$-cycle in
$G(P',Q')$.

^4 We remark that in (5)-(8) all indices of $M$ are in agreement with the corresponding cycle structure, i.e.,
$L(\mathbf{c}) = n$ and each of $L(\mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_j - \mathbf{e}_{i-j})$, $L(\mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_i)$, $L(\mathbf{c} - \mathbf{e}_{i+1} + \mathbf{e}_j + \mathbf{e}_{i-j})$ equals $n - 1$.

^5 A similar technique for a different enumeration problem was used in Alexeev et al. (2016).

Figure 3: A transformation of breakpoint graphs corresponding to removal of gene $l$ from
genomes $P$ and $Q$ resulting in genomes $P'$ and $Q'$. (a) The graph $G(P,Q)$ has no gray edge
$\{l^t, l^h\}$, i.e., $a \ne l^h$ and $c \ne l^t$. (b) The graph $G(P,Q)$ contains the gray edge $\{l^t, l^h\}$, i.e.,
$a = l^h$ and $c = l^t$.


Case 4. Vertices $a, c$ coincide with $l^h, l^t$, i.e., $a = l^h$ and $c = l^t$. This means that gene
$l$ forms its own chromosome in $Q$ and this chromosome is removed in $Q'$ (Fig. 3b). In this
case, vertices $a, b, c, d$ belong to the same $i$-cycle in $G(P,Q)$ for some $i \ge 2$, and $b, d$ belong
to an $(i-1)$-cycle in $G(P',Q')$. Hence, $Q' \in \mathcal{Q}_{n-1}(h-1;\, \mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_i)$.

We define a function $\Gamma_l$, which maps a genome $Q$ to a pair $(Q', (a, c))$ (Cases 1-3) or
a genome $Q'$ (Case 4), where $(a, c)$ is an ordered pair of vertices corresponding to a gray
edge in $G(P',Q')$. For any integers $n > 1$ and $l \in \{1, \dots, n-1\}$, we will prove that $\Gamma_l$ is
a bijection between (i) the $h$-chromosomal genomes $Q$ on $n$ genes; and (ii) the union of the
$h$-chromosomal genomes $Q'$ on $n - 1$ genes with a marked gray edge in $G(P',Q')$ and the
$(h-1)$-chromosomal genomes on $n - 1$ genes. Namely, we will show that $\Gamma_l$ is invertible.
Indeed, given an $h$-chromosomal genome $Q'$ on genes $\{1, 2, \dots, n-1\}$ and a pair $(a, c)$, we
relabel the genes consecutively into $\{1, 2, \dots, l-1, l+1, \dots, n\}$. To reconstruct a genome
$Q$ from $Q'$, we insert gene $l$ between the genes corresponding to vertices $a$ and $c$ (in
the direction from $a$ to $c$). Similarly, given an $(h-1)$-chromosomal genome $Q'$ on genes
$\{1, 2, \dots, n-1\}$, we relabel its genes and construct genome $Q$ from $Q'$ by adding a new
chromosome consisting of a single gene $l$.

^6 We remark that a similar-looking case $b = l^h$ and $d = l^t$ is not possible, since $P$ is a unichromosomal
genome with $n > 1$ genes.

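To obtain a formula for $M_n(h;\mathbf{c})$ for a given integer $h \ge 1$ and cycle structure $\mathbf{c}$ (with $n =
L(\mathbf{c})$), we restrict functions $\Gamma_l$ to the genomes $Q \in \mathcal{Q}_n(h;\mathbf{c})$. Since there are $n - 1$ values of
$l$, the total number of pairs $(Q, \Gamma_l(Q))$ equals $(n-1) M_n(h;\mathbf{c})$. Since each $\Gamma_l$ is a bijection,
this amount also equals the sum of

• the number of pairs $(\Gamma_l^{-1}((Q', (a, c))), (Q', (a, c)))$, where $\Gamma_l^{-1}((Q', (a, c))) \in \mathcal{Q}_n(h;\mathbf{c})$ and $Q'$
belongs to $\mathcal{Q}_{n-1}(h;\, \mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_j - \mathbf{e}_{i-j})$, $\mathcal{Q}_{n-1}(h;\, \mathbf{c} + \mathbf{e}_i - \mathbf{e}_{i+1})$, or $\mathcal{Q}_{n-1}(h;\, \mathbf{c} - \mathbf{e}_{i+1} + \mathbf{e}_j + \mathbf{e}_{i-j})$
for some $i > j \ge 1$ (Cases 1, 2, 3, respectively); and

• the number of pairs $(\Gamma_l^{-1}(Q'), Q')$, where $\Gamma_l^{-1}(Q') \in \mathcal{Q}_n(h;\mathbf{c})$ and $Q' \in \mathcal{Q}_{n-1}(h-1;\, \mathbf{c} +
\mathbf{e}_{i-1} - \mathbf{e}_i)$ (Case 4).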

We consider Cases 1 and 3 in detail.

In Case 1, for any given integers $i > j \ge 1$, we consider a genome $Q' \in \mathcal{Q}_{n-1}(h;\, \mathbf{c} +
\mathbf{e}_{i-1} - \mathbf{e}_j - \mathbf{e}_{i-j})$ composed of genes $\{1, 2, \dots, n-1\}$ and enumerate the ways to reconstruct
some genome $Q \in \mathcal{Q}_n(h;\mathbf{c})$ from $Q'$. First, we choose an $(i-1)$-cycle $C$ in $G(P',Q')$, which
can be done in $c_{i-1} + 1 - \delta_{j,1} - \delta_{i-j,1}$ ways. Then we choose an integer $l$ such that the
black edge $\{(l-1)^h, l^t\}$ belongs to $C$, which can be done in $i - 1$ ways. Then the cycle
$C$ has the form $((l-1)^h, l^t, \dots, c, a, \dots)$, where there are $2i - 2j$ edges between vertices $l^t$
and $c$ (and thus $\{a, c\}$ represents a gray edge in $C$). Then we reconstruct a genome $Q$ as
$Q = \Gamma_l^{-1}((Q', (a, c)))$. Summing over the values of $i, j$ gives the term (5) for the total number
of such genomes $Q$.

In Case 3, for any given integers $i > j \ge 1$, we consider a genome $Q' \in \mathcal{Q}_{n-1}(h;\, \mathbf{c} -
\mathbf{e}_{i+1} + \mathbf{e}_j + \mathbf{e}_{i-j})$ composed of genes $\{1, 2, \dots, n-1\}$ and enumerate the number of ways to
reconstruct some genome $Q \in \mathcal{Q}_n(h;\mathbf{c})$ from $Q'$. First, we choose a $j$-cycle and an $(i-j)$-cycle
in $G(P',Q')$, which can be done in $(c_j + 1)(c_{i-j} + 1 + \delta_{j,i-j})$ ways. Then we choose a
gray edge $\{u, v\}$ in the $j$-cycle (in $j$ ways) and choose an integer $l$ such that the black edge
$\{(l-1)^h, l^t\}$ is in the $(i-j)$-cycle (in $i - j$ ways). Then we reconstruct a genome $Q$ in two
ways: $Q = \Gamma_l^{-1}((Q', (u, v)))$ and $Q = \Gamma_l^{-1}((Q', (v, u)))$, which gives the factor 2. Summing over
the values of $i, j$ gives the term (7) for the total number of such genomes $Q$.

Cases 2 and 4 follow similarly and deliver the terms (6) and (8), respectively. □
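The recurrence of Lemma 3.2 can be turned into a short program. The following Python sketch is an illustration under stated assumptions rather than the authors' implementation (their Mathematica code is in the Appendix): the names `next_level` and `table` are hypothetical, a cycle structure is encoded as a sorted tuple of cycle lengths, and the terms (5)-(8) are applied in "push" form, i.e., each genome class on $n - 1$ genes distributes its count to the classes on $n$ genes. The flag `signed=False` drops term (6) and the factor 2 in term (7), which corresponds to the co-oriented case of Section 4.1.

```python
from collections import Counter

def next_level(level, n, signed=True):
    """Counts for n genes from the counts for n - 1 genes.

    `level` maps (h, cycles) to M_{n-1}(h; c), where `cycles` is the sorted
    tuple of cycle lengths encoding the cycle structure c.
    """
    total = Counter()
    for (h, cycles), m in level.items():
        for p, a in enumerate(cycles):
            rest = cycles[:p] + cycles[p + 1:]
            # Term (5): an a-cycle of G(P',Q') yields a j-cycle and an (a+1-j)-cycle.
            for j in range(1, a + 1):
                total[h, tuple(sorted(rest + (j, a + 1 - j)))] += a * m
            # Term (6): an a-cycle becomes an (a+1)-cycle (arbitrary orientation only).
            if signed:
                total[h, tuple(sorted(rest + (a + 1,)))] += a * a * m
            # Term (8): the inserted gene forms a chromosome of its own.
            total[h + 1, tuple(sorted(rest + (a + 1,)))] += a * m
            # Term (7): an a-cycle and a b-cycle merge into an (a+b+1)-cycle.
            for q, b in enumerate(cycles):
                if q != p:
                    others = tuple(x for r, x in enumerate(cycles) if r not in (p, q))
                    key = (h, tuple(sorted(others + (a + b + 1,))))
                    total[key] += (2 if signed else 1) * a * b * m
    # The accumulated sums equal (n - 1) * M_n(h; c), so the division is exact.
    return {key: value // (n - 1) for key, value in total.items()}

def table(n, signed=True):
    """Return {(h, cycles): M_n(h; c)} starting from the initial condition."""
    level = {(1, (1,)): 1}
    for size in range(2, n + 1):
        level = next_level(level, size, signed)
    return level

unichromosomal = {c: v for (h, c), v in table(4).items() if h == 1}
print(unichromosomal)
```

Restricting `table(4)` to $h = 1$ gives the coefficients of $F_4(0; s_1, s_2, \dots)$ listed in Section 4.1.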


4 Applications 

4.1 Hultman Numbers 

Let $P$ be a fixed linear unichromosomal genome on $n$ co-oriented genes and $H(n, n+1-d)$
be the number of linear unichromosomal genomes $Q$ on the same co-oriented genes such that
the 2-break distance between $P$ and $Q$ is $d$. The numbers $H(n,m)$ are called Hultman
numbers (Doignon and Labarre, 2007; Bóna and Flynn, 2009; Alexeev and Zograf, 2014) and are
present in the OEIS (The OEIS Foundation, 2016) as the sequence A164652. The problem
of enumerating linear unichromosomal genomes can be reduced to enumerating circular
genomes as follows. One can add a virtual gene 0 to the genomes $P$ and $Q$ between the
first and last genes on their chromosomes, making them circular. Then the 2-break distance
between $P$ and $Q$ equals $n + 1 - m$, where $m$ is the number of cycles in the (modified)
breakpoint graph $G(P,Q)$.

The Hultman numbers can be obtained from a modification of Theorem 3.1. Namely,
let $P$ be a fixed unichromosomal circular genome with genes $\{1, 2, \dots, n\}$ and let $\mathcal{Q}_n^+(h;\mathbf{c})$
be the set of $h$-chromosomal circular genomes $Q$ on the same co-oriented $n$ genes such that
$G(P,Q)$ has cycle structure $\mathbf{c}$. Denote the cardinality of $\mathcal{Q}_n^+(h;\mathbf{c})$ by $M_n^+(h;\mathbf{c})$.

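The generating function of numbers $M_n^+(h;\mathbf{c})$ is defined by

\[ G(x;u;s_1,s_2,\dots) = \sum_{n=1}^{\infty} x^{n-1} \sum_{h=1}^{\infty} u^{h-1} \sum_{\mathbf{c}:\, L(\mathbf{c})=n} M_n^+(h;\mathbf{c}) \prod_{i=1}^{\infty} s_i^{c_i}. \]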


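Theorem 4.1. The following equation, together with the initial condition $G(0;u;s_1,s_2,\dots) = s_1$,
uniquely determines the generating function $G(x;u;s_1,s_2,\dots)$:

\[ \frac{\partial G}{\partial x}
 = \sum_{i=2}^{\infty} \sum_{j=1}^{i-1} (i-1)\, s_j s_{i-j} \frac{\partial G}{\partial s_{i-1}}
 + \sum_{i=2}^{\infty} \sum_{j=1}^{i-1} j(i-j)\, s_{i+1} \frac{\partial^2 G}{\partial s_j\, \partial s_{i-j}}
 + u \sum_{i=1}^{\infty} i\, s_{i+1} \frac{\partial G}{\partial s_i}. \]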


Proof. The proof is similar to the proof of Theorem 3.1 and Lemma 3.2, except that genome
$Q$ here has to have co-oriented genes and thus there is no Case 2 and there is no factor 2 for
Case 3. □


Let $F_n(u; s_1, s_2, \dots)$ and $G_n(u; s_1, s_2, \dots)$ be the coefficients of $x^{n-1}$ in $F(x;u;s_1,s_2,\dots)$
and $G(x;u;s_1,s_2,\dots)$, respectively. The first few values^7 of $F_n(0; s_1, s_2, \dots)$ and $G_n(0; s_1, s_2, \dots)$
corresponding to unichromosomal genomes are listed below:

\begin{align*}
F_1(0; s_1, s_2, \dots) &= s_1, \\
F_2(0; s_1, s_2, \dots) &= s_1^2 + s_2, \\
F_3(0; s_1, s_2, \dots) &= s_1^3 + 3 s_1 s_2 + 4 s_3, \\
F_4(0; s_1, s_2, \dots) &= s_1^4 + 6 s_1^2 s_2 + (5 s_2^2 + 16 s_1 s_3) + 20 s_4, \\
F_5(0; s_1, s_2, \dots) &= s_1^5 + 10 s_1^3 s_2 + (40 s_1^2 s_3 + 25 s_1 s_2^2) + (100 s_1 s_4 + 60 s_2 s_3) + 148 s_5.
\end{align*}

\begin{align*}
G_1(0; s_1, s_2, \dots) &= s_1, \\
G_2(0; s_1, s_2, \dots) &= s_1^2, \\
G_3(0; s_1, s_2, \dots) &= s_1^3 + s_3, \\
G_4(0; s_1, s_2, \dots) &= s_1^4 + (4 s_1 s_3 + s_2^2), \\
G_5(0; s_1, s_2, \dots) &= s_1^5 + (10 s_1^2 s_3 + 5 s_1 s_2^2) + 8 s_5, \\
G_6(0; s_1, s_2, \dots) &= s_1^6 + (20 s_1^3 s_3 + 15 s_1^2 s_2^2) + (48 s_1 s_5 + 12 s_3^2 + 24 s_2 s_4).
\end{align*}

^7 These values are computed with the Mathematica code given in the Appendix.


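Taking $s_i = s$ for all $i = 1, 2, \dots$, we get

\[ G_n(0; s, s, \dots) = \sum_{m=1}^{n} H(n-1, m)\, s^m. \]

In particular, we obtain the following formula for Hultman numbers:

\[ H(n-1, m) = \sum_{\mathbf{c} \in C_{n,m}} M_n^+(1;\mathbf{c}), \]

where $C_{n,m} = \{\mathbf{c} : L(\mathbf{c}) = n \text{ and } \sum_{i \ge 1} c_i = m\}$.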
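For small $n$, the Hultman numbers can also be checked by brute force. The sketch below is only an illustration (the function name `hultman_row` is an assumption): it adds the virtual gene 0 as described above and, since all genes are co-oriented, traces the cycles of the breakpoint graph by alternating "next gene in $P$" (a black edge from a head to a tail) with "previous gene in $Q$" (a gray edge from a tail to a head).

```python
from collections import Counter
from itertools import permutations

def hultman_row(n):
    """Return {m: H(n, m)} by enumerating all permutations of {1, ..., n}."""
    row = Counter()
    for perm in permutations(range(1, n + 1)):
        q = (0,) + perm                       # circular genome Q with virtual gene 0
        pred_q = {q[k]: q[k - 1] for k in range(n + 1)}
        # head of g --black--> tail of g+1 --gray--> head of the Q-predecessor of g+1
        step = {g: pred_q[(g + 1) % (n + 1)] for g in range(n + 1)}
        seen, cycles = set(), 0
        for start in range(n + 1):
            if start in seen:
                continue
            cycles += 1
            g = start
            while g not in seen:
                seen.add(g)
                g = step[g]
        row[cycles] += 1
    return dict(row)

for n in range(1, 5):
    print(n, sorted(hultman_row(n).items()))
```

The exhaustive enumeration takes $n!$ steps, so it is only meant as a sanity check for the first rows of Table 1(a).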


Grusea and Labarre (2013) introduced the problem of enumerating linear unichromosomal
genomes, where genes may be arbitrarily oriented. The corresponding signed Hultman
numbers $H^{\pm}(n,m)$ form the sequence A189507 in the OEIS. Theorem 3.1 allows us to
compute these numbers as follows:

\[ H^{\pm}(n-1, m) = \sum_{\mathbf{c} \in C_{n,m}} M_n(1;\mathbf{c}). \tag{9} \]

The first few numbers $H(n,m)$ and $H^{\pm}(n,m)$ are listed in Table 1.


Table 1: Values of Hultman numbers.

(a) Values of $H(n,m)$.

n\m     1     2     3     4     5     6
0       1
1       0     1
2       1     0     1
3       0     5     0     1
4       8     0    15     0     1
5       0    84     0    35     0     1

(b) Values of $H^{\pm}(n,m)$.

n\m     1     2     3     4     5     6
0       1
1       1     1
2       4     3     1
3      20    21     6     1
4     148   160    65    10     1
5    1348  1620   701   155    15     1


4.2 Bell Polynomials 

The numbers $M_n(h;\mathbf{c})$ have multiple connections to well-known combinatorial objects. Some
of these connections are straightforward, and some appear to be new.

It is easy to see that in the unichromosomal case, $G_n(0; 1, 1, \dots)$ enumerates permutations
of order $n - 1$, and so

\[ G_n(0; 1, 1, \dots) = (n-1)!. \]

Similarly, $F_n(0; 1, 1, \dots)$ enumerates signed permutations of order $n - 1$, and so

\[ F_n(0; 1, 1, \dots) = 2^{n-1} (n-1)!. \]

In the multichromosomal case, we get more general formulae:

\[ G_n(u; 1, 1, \dots) = \sum_{h=1}^{n} \begin{bmatrix} n \\ h \end{bmatrix} u^{h-1} \]

and

\[ F_n(u; 1, 1, \dots) = \sum_{h=1}^{n} 2^{n-h} \begin{bmatrix} n \\ h \end{bmatrix} u^{h-1}, \]

where $\begin{bmatrix} n \\ h \end{bmatrix}$ are unsigned Stirling numbers of the first kind (A094638 in the OEIS). Moreover,
for $u = 1$, we have

\[ G_n(1; s_1, s_2, \dots) = \sum_{\mathbf{c}:\, L(\mathbf{c})=n} \frac{n!}{\prod_{i=1}^{n} c_i!\, i^{c_i}} \prod_{i=1}^{n} s_i^{c_i}. \tag{10} \]


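The numbers $L(\mathbf{c})! / (c_1!\, 1^{c_1} c_2!\, 2^{c_2} \cdots)$ enumerate permutations with the cycle structure $\mathbf{c}$ and
form the sequence A124795 in the OEIS. The functions $G_n(1; s_1, s_2, \dots)$ are closely related
to the complete exponential Bell polynomials (Comtet, 1974, Section 3.3)

\[ Y_n(x_1, x_2, \dots) = \sum_{\mathbf{c}:\, L(\mathbf{c})=n} \frac{n!}{\prod_{i=1}^{n} c_i!\, (i!)^{c_i}} \prod_{i=1}^{n} x_i^{c_i}. \]

Namely, from (10) and the definition of $Y_n$ it follows that

\[ G_n\!\left(1;\ \frac{s_1}{0!}, \frac{s_2}{1!}, \frac{s_3}{2!}, \dots, \frac{s_k}{(k-1)!}, \dots\right) = Y_n(s_1, s_2, \dots). \tag{11} \]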


Hence, Theorem 4.1 implies the following (apparently new) differential equation for Bell
polynomials:

\[ (n-1)\, Y_n(x_1, x_2, \dots)
 = \sum_{i=2}^{\infty} \sum_{j=1}^{i-1} (i-1) \binom{i-2}{j-1} x_j x_{i-j} \frac{\partial Y_{n-1}}{\partial x_{i-1}}
 + \sum_{i=2}^{\infty} \sum_{j=1}^{i-1} \frac{x_{i+1}}{\binom{i}{j}} \frac{\partial^2 Y_{n-1}}{\partial x_j\, \partial x_{i-j}}
 + \sum_{i=2}^{\infty} x_i \frac{\partial Y_{n-1}}{\partial x_{i-1}}. \]

4.3 Distribution Of k-Break Distances

Let $H_k^h(n,d)$ be the number of $h$-chromosomal circular genomes with $n$ genes at the $k$-break
distance $d$ from a fixed unichromosomal circular genome. For $k = 2$ and $h = 1$, these
numbers represent signed Hultman numbers: $H_2^1(n,d) = H^{\pm}(n-1, n-d)$.

Using formulae (2) and (3), we can further obtain $H_3^h(n,d)$ and $H_4^h(n,d)$. The first few
numbers $H_3^1(n,d)$, $H_3^2(n,d)$, $H_4^1(n,d)$, and $H_4^2(n,d)$ are listed in Table 2.


Table 2: Values of generalized Hultman numbers $H_k^h(n,d)$ (for $k$-break distance $d$ and $h$ chromosomes).

(a) Values of $H_3^2(n,d)$.

n\d        1        2        3
1          0        0        0
2          1        0        0
3          6        0        0
4         18       26        0
5         40      360        0
6         75     2034     2275
7        126     7588    48734

(b) Values of $H_3^1(n,d)$.

n\d        0        1        2        3
1          1        0        0        0
2          1        1        0        0
3          1        7        0        0
4          1       22       25        0
5          1       50      333        0
6          1       95     1851     1893
7          1      161     6839    39079

(c) Values of $H_4^2(n,d)$.

n\d        1        2        3
1          0        0        0
2          1        0        0
3          6        0        0
4         44        0        0
5        170      230        0
6        465     3919        0
7       1036    55412        0
8       2016   396764   437572

(d) Values of $H_4^1(n,d)$.

n\d        0        1        2        3
1          1        0        0        0
2          1        1        0        0
3          1        7        0        0
4          1       47        0        0
5          1      175      208        0
6          1      470     3369        0
7          1     1036    45043        0
8          1     2002   315213   327904


4.4 Sampling Of Random Genomes 

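Theorem 3.1 and Lemma 3.2 allow us to sample a (uniformly) random genome $Q$ with a given
number of genes $n$, number of chromosomes $h$, and cycle structure $\mathbf{c}$ of the breakpoint graph
$G(P,Q)$. Namely, we define a Markov chain $\mathcal{M}$ as follows:

• the states of $\mathcal{M}$ are genome classes $\mathcal{Q}_n(h;\mathbf{c})$;

• the probability of transition between $\mathcal{Q}_n(h;\mathbf{c})$ and $\mathcal{Q}_{n-1}(h;\, \mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_j - \mathbf{e}_{i-j})$ (for
any $i \ge 2$ and $1 \le j < i$) is

\[ \frac{(i-1)(c_{i-1} + 1 - \delta_{j,1} - \delta_{i-j,1})\, M_{n-1}(h;\, \mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_j - \mathbf{e}_{i-j})}{(n-1)\, M_n(h;\mathbf{c})}; \]

• the probability of transition between $\mathcal{Q}_n(h;\mathbf{c})$ and $\mathcal{Q}_{n-1}(h;\, \mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_i)$ (for any $i \ge 2$)
is

\[ \frac{(i-1)^2 (c_{i-1} + 1)\, M_{n-1}(h;\, \mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_i)}{(n-1)\, M_n(h;\mathbf{c})}; \]

• the probability of transition between $\mathcal{Q}_n(h;\mathbf{c})$ and $\mathcal{Q}_{n-1}(h;\, \mathbf{c} - \mathbf{e}_{i+1} + \mathbf{e}_j + \mathbf{e}_{i-j})$ (for
any $i \ge 2$ and $1 \le j < i$) is

\[ \frac{2 j (i-j)(c_j + 1)(c_{i-j} + 1 + \delta_{j,i-j})\, M_{n-1}(h;\, \mathbf{c} - \mathbf{e}_{i+1} + \mathbf{e}_j + \mathbf{e}_{i-j})}{(n-1)\, M_n(h;\mathbf{c})}; \]

• the probability of transition between $\mathcal{Q}_n(h;\mathbf{c})$ and $\mathcal{Q}_{n-1}(h-1;\, \mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_i)$ (for any
$i \ge 2$) is

\[ \frac{(i-1)(c_{i-1} + 1)\, M_{n-1}(h-1;\, \mathbf{c} + \mathbf{e}_{i-1} - \mathbf{e}_i)}{(n-1)\, M_n(h;\mathbf{c})}; \]

• the probability of transition between $\mathcal{Q}_1(1;\mathbf{e}_1)$ and itself is equal to 1;

• in the other cases, the transition probability equals 0.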

Lemma 3.2 implies that the Markov chain $\mathcal{M}$ is well-defined. For any initial state $\mathcal{Q}_n(h;\mathbf{c})$,
the process after $n - 1$ steps comes into the terminal state $\mathcal{Q}_1(1;\mathbf{e}_1)$, which consists of a
single genome.

To sample a random genome $Q \in \mathcal{Q}_n(h;\mathbf{c})$, we first sample a random path $(\mathcal{Q}_n, \mathcal{Q}_{n-1}, \dots, \mathcal{Q}_1)$
starting at $\mathcal{Q}_n = \mathcal{Q}_n(h;\mathbf{c})$ and ending at the terminal state $\mathcal{Q}_1 = \mathcal{Q}_1(1;\mathbf{e}_1)$. We start with
$Q \in \mathcal{Q}_1$ (i.e., $Q$ is a genome with a single gene) and for every $j$ from 1 to $n - 1$, we randomly
add a gene into $Q$ such that the resulting genome belongs to $\mathcal{Q}_{j+1}$. By construction, at the
end of this process the genome $Q$ represents a uniformly random element of $\mathcal{Q}_n(h;\mathbf{c})$.
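The first phase of this procedure, sampling a path of genome classes, can be sketched as follows. This Python snippet is a hedged illustration, not the authors' code: the names `moves`, `M`, and `sample_path` are hypothetical, a cycle structure is encoded as a sorted tuple of cycle lengths, and the numbers $M_n(h;\mathbf{c})$ are computed by memoized recursion over the terms (5)-(8) of Lemma 3.2. The second phase (inserting genes into $Q$ along the reversed path) is not covered here.

```python
import random
from collections import Counter
from functools import lru_cache

def moves(h, cycles):
    """List (weight, h', cycles') for the transitions out of the class Q_n(h; c).

    The weight is the factor in front of M_{n-1} in the terms (5)-(8).
    """
    cnt, n, out = Counter(cycles), sum(cycles), []

    def swap(remove, add):
        c = Counter(cnt)
        c.subtract(remove)
        c.update(add)
        return c

    def emit(weight, h_new, c):
        if weight > 0:
            out.append((weight, h_new, tuple(sorted(c.elements()))))

    for i in range(2, n + 1):
        for j in range(1, i):
            need = Counter([j, i - j])
            if all(cnt[length] >= k for length, k in need.items()):   # term (5)
                c = swap(need, [i - 1])
                emit((i - 1) * c[i - 1], h, c)
            if cnt[i + 1] > 0:                                        # term (7)
                c = swap([i + 1], [j, i - j])
                pairs = c[j] * (c[j] - 1) if 2 * j == i else c[j] * c[i - j]
                emit(2 * j * (i - j) * pairs, h, c)
        if cnt[i] > 0:
            c = swap([i], [i - 1])
            emit((i - 1) ** 2 * c[i - 1], h, c)                       # term (6)
            if h > 1:
                emit((i - 1) * c[i - 1], h - 1, c)                    # term (8)
    return out

@lru_cache(maxsize=None)
def M(h, cycles):
    """M_n(h; c) with n = sum(cycles), via the recurrence of Lemma 3.2."""
    n = sum(cycles)
    if n == 1:
        return 1 if h == 1 else 0
    return sum(w * M(h2, c2) for w, h2, c2 in moves(h, cycles)) // (n - 1)

def sample_path(h, cycles, rng=random):
    """Sample a path of classes from Q_n(h; c) down to Q_1(1; e_1); needs M(h, cycles) > 0."""
    path = [(h, cycles)]
    while sum(cycles) > 1:
        options = moves(h, cycles)
        weights = [w * M(h2, c2) for w, h2, c2 in options]
        _, h, cycles = rng.choices(options, weights=weights)[0]
        path.append((h, cycles))
    return path

print(sample_path(1, (2, 3), random.Random(0)))
```

Normalizing the weights by $(n-1) M_n(h;\mathbf{c})$ gives exactly the transition probabilities listed above; `random.choices` performs this normalization implicitly.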


5 Discussion 


In the current work, we address the problem of enumeration of genomes with $n$ genes that
are at a given $k$-break distance from a fixed unichromosomal genome. It is known that the
$k$-break distance between two genomes can be computed in terms of cycle lengths in the
breakpoint graph of these genomes (Alekseyev and Pevzner, 2008).


Our main result is the recurrent formula for the numbers $M_n(h;\mathbf{c})$ (and their generating
function) of breakpoint graphs with the cycle structure $\mathbf{c}$ of $h$-chromosomal genomes
with $n$ genes. We show a connection between these numbers and various combinatorial
objects (such as Bell polynomials) and further compute the numbers $H_k^h(n,d)$ of $h$-chromosomal
genomes with $n$ genes at a given $k$-break distance $d$ from a fixed unichromosomal genome, which
generalize Hultman numbers (Hultman, 1999; Doignon and Labarre, 2007; Bóna and Flynn,
2009; Alexeev and Zograf, 2014; Grusea and Labarre, 2013).


We believe that our approach can further lead to finding a formula for the numbers
$H_k^h(n,d)$ and then to evaluating the asymptotic distribution of the $k$-break distances for a
general $k$. Other open questions of interest include enumeration of genomes $Q$ at a given
$k$-break distance from a fixed genome $P$, where (i) $P$ is unichromosomal and $Q$ is linear
multichromosomal (the case $k = 2$ was addressed by Feijão et al. (2014)), or (ii) $P$ and
$Q$ are both multichromosomal. Both questions may be addressed under the assumption
of co-oriented or arbitrarily oriented genes. Defining proper $k$-breaks as those that are
not $(k-1)$-breaks, we may ask similar questions for the graded $(2, 3, \dots, k)$-break distance
specifying the number of proper $i$-breaks for each $i = 2, 3, \dots, k$. Further assuming that
proper $k$-breaks for different $k$ have different rates in the course of evolution, we may be
able to estimate these rates from given (extant) genomes, using the technique proposed by
Alexeev et al. (2015) for $k = 2, 3$.

Appendix. Mathematica Code

Here we provide Wolfram Mathematica code for computing the functions $G_n(0; s_1, s_2, \dots)$,
$G_n(u; s_1, s_2, \dots)$, $F_n(0; s_1, s_2, \dots)$, $F_n(u; s_1, s_2, \dots)$:

(* Implementation of the summands in the formula in Theorem 3.1 *)

L0[f_, n_] :=
  Sum[Sum[(i - 1)*s[j]*s[i - j]*D[f, s[i - 1]], {j, 1, i - 1}], {i, 2, n}];

L1[f_, n_] := Sum[(i - 1)^2*s[i]*D[f, s[i - 1]], {i, 2, n}];

L2[f_, n_] :=
  Sum[s[i + 1]*Sum[j*(i - j)*D[f, s[j], s[i - j]], {j, 1, i - 1}], {i, 2, n}];

Ln[f_, n_] := Sum[(i - 1)*u*s[i]*D[f, s[i - 1]], {i, 2, n}];

(* ff collects the coefficients for 1, 2, ... genes; step k computes the (k+1)-th
   coefficient from the k-th one. orient = 1 for arbitrarily oriented genes (0 for
   co-oriented), multichr = 1 to allow several chromosomes (0 for unichromosomal). *)
FG[n_, orient_, multichr_] := Module[{ff = {s[1]}, g, f},
  Do[g = Last[ff];
   f = 1/k*(L0[g, n] + orient*L1[g, n] + (1 + orient)*L2[g, n]
       + multichr*Ln[g, n]);
   AppendTo[ff, Simplify[f]], {k, n}];
  ff[[n]]]

(* Implementation of function G_n(0; s1, s2, ...) *)
G0[n_] := FG[n, 0, 0]

(* Implementation of function G_n(u; s1, s2, ...) *)
Gu[n_] := FG[n, 0, 1]

(* Implementation of function F_n(0; s1, s2, ...) *)
F0[n_] := FG[n, 1, 0]

(* Implementation of function F_n(u; s1, s2, ...) *)
Fu[n_] := FG[n, 1, 1]